{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "41a58112-8a18-49ad-9c8c-2088778613d0",
   "metadata": {},
   "source": [
    "# RAG Fusion Query Pipeline\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/query/rag_fusion_pipeline/rag_fusion_pipeline.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "This notebook shows how to implement RAG Fusion using the LlamaIndex Query Pipeline syntax."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c5bdbf5-d525-42e2-ab4a-19234c023491",
   "metadata": {},
   "source": [
    "## Setup / Load Data\n",
    "\n",
    "We load in the pg_essay.txt data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "20b91e78-d733-44dd-b9a2-7c6eab3aee3d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-01-09 21:50:55--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8003::154, 2606:50c0:8001::154, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 75042 (73K) [text/plain]\n",
      "Saving to: ‘pg_essay.txt’\n",
      "\n",
      "pg_essay.txt        100%[===================>]  73.28K  --.-KB/s    in 0.01s   \n",
      "\n",
      "2024-01-09 21:50:56 (5.83 MB/s) - ‘pg_essay.txt’ saved [75042/75042]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt' -O pg_essay.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1785ffb1-5fc2-4855-854a-238003ff848b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import SimpleDirectoryReader\n",
    "\n",
    "reader = SimpleDirectoryReader(input_files=[\"pg_essay.txt\"])\n",
    "docs = reader.load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3bc7134-a34b-41de-b7e6-c85a6e356b56",
   "metadata": {},
   "source": [
    "### [Optional] Setup Tracing\n",
    "\n",
    "We also setup tracing through Arize Phoenix to look at our outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d32462fb-07e0-4e5d-871c-98a782363330",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/jerryliu/Programming/llama-hub/.venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🌍 To view the Phoenix app in your browser, visit http://127.0.0.1:6006/\n",
      "📺 To view the Phoenix app in a notebook, run `px.active_session().view()`\n",
      "📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix\n"
     ]
    }
   ],
   "source": [
    "import phoenix as px\n",
    "\n",
    "px.launch_app()\n",
    "import llama_index\n",
    "\n",
    "llama_index.set_global_handler(\"arize_phoenix\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7c9299c-06d6-4710-afb0-3cc13761f358",
   "metadata": {},
   "source": [
    "## Setup Llama Pack\n",
    "\n",
    "Next we download the LlamaPack. All the code is in the downloaded directory - we encourage you to take a look to see the QueryPipeline syntax!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f5143a5f-7cd3-4aac-99e7-a357f3239f03",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Option 1: Use `download_llama_pack`\n",
    "# from llama_index.llama_pack import download_llama_pack\n",
    "\n",
    "# RAGFusionPipelinePack = download_llama_pack(\n",
    "#     \"RAGFusionPipelinePack\",\n",
    "#     \"./rag_fusion_pipeline_pack\",\n",
    "#     # leave the below line commented out if using the notebook on main\n",
    "#     # llama_hub_url=\"https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_query_pipeline_pack/llama_hub\"\n",
    "# )\n",
    "\n",
    "# Option 2: Import from llama_hub package\n",
    "from llama_hub.llama_packs.query.rag_fusion_pipeline.base import RAGFusionPipelinePack\n",
    "\n",
    "\n",
    "from llama_index.llms import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1e4786b6-6add-4caf-9f9c-c2d8f455702a",
   "metadata": {},
   "outputs": [],
   "source": [
    "pack = RAGFusionPipelinePack(docs, llm=OpenAI(model=\"gpt-3.5-turbo\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0a84473-0b2f-4649-be8e-7a603f526f02",
   "metadata": {},
   "source": [
    "## Inspecting the Code\n",
    "\n",
    "If we take a look at how it's setup (in your downloaded directory, you'll see the following code using our QueryPipeline syntax). \n",
    "\n",
    "`retrievers` is a dictionary mapping a chunk size to retrievers (chunk sizes: 128, 256, 512, 1024). \n",
    "\n",
    "```python\n",
    "# construct query pipeline\n",
    "p = QueryPipeline()\n",
    "module_dict = {\n",
    "    **self.retrievers,\n",
    "    \"input\": InputComponent(),\n",
    "    \"summarizer\": TreeSummarize(),\n",
    "    # NOTE: Join args\n",
    "    \"join\": ArgPackComponent(),\n",
    "    \"reranker\": rerank_component,\n",
    "}\n",
    "p.add_modules(module_dict)\n",
    "# add links from input to retriever (id'ed by chunk_size)\n",
    "for chunk_size in self.chunk_sizes:\n",
    "    p.add_link(\"input\", str(chunk_size))\n",
    "    p.add_link(str(chunk_size), \"join\", dest_key=str(chunk_size))\n",
    "p.add_link(\"join\", \"reranker\")\n",
    "p.add_link(\"input\", \"summarizer\", dest_key=\"query_str\")\n",
    "p.add_link(\"reranker\", \"summarizer\", dest_key=\"nodes\")\n",
    "```\n",
    "\n",
    "We visualize the DAG below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "34143d87-163f-4246-8ee5-15940d17c629",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "rag_dag.html\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "\n",
       "        <iframe\n",
       "            width=\"100%\"\n",
       "            height=\"600px\"\n",
       "            src=\"rag_dag.html\"\n",
       "            frameborder=\"0\"\n",
       "            allowfullscreen\n",
       "            \n",
       "        ></iframe>\n",
       "        "
      ],
      "text/plain": [
       "<IPython.lib.display.IFrame at 0x2afa52050>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pyvis.network import Network\n",
    "\n",
    "net = Network(notebook=True, cdn_resources=\"in_line\", directed=True)\n",
    "net.from_nx(pack.query_pipeline.dag)\n",
    "net.show(\"rag_dag.html\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "065ef0c4-0c9e-4611-8de4-98b2a7fe3094",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author wrote short stories and tried writing programs on an IBM 1401 computer during their school days. They also mentioned painting still lives in their bedroom at night while they were a student at the Accademia.\n"
     ]
    }
   ],
   "source": [
    "response = pack.run(query=\"What did the author do growing up?\")\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ea4ef382-fd11-4a82-8f5d-b2c53a21c665",
   "metadata": {},
   "outputs": [],
   "source": [
    "# response.source_nodes"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_hub",
   "language": "python",
   "name": "llama_hub"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
